This repository contains the replication materials for:

Wouter van Atteveldt, Tamir Sheafer, Shaul R. Shenhav, and Yair Fogel-Dror
Clause analysis:
Using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008-2009 Gaza War

To be published in Political Analysis

All files in this repository are (C) Wouter van Atteveldt, licensed CC0 (public domain)
Contact: Wouter van Atteveldt, wouter@vanatteveldt.com

* Description

The article presents three analyses: a validation of the clause
analysis method against a gold standard, a substantive application of
the method to the Gaza War coverage, and an application of the
baseline method to the same case (published in the online appendix).

This repository contains the materials needed to replicate these
analyses. For each analysis there is an R file which recreates the
figures and/or tables as presented in the article. These analyses
depend on the R library published for the method. To get exactly the
same results as the article, use the package as published on GitHub
at http://github.com/anon-author/clauses.  To use the method in your
own research, please use the updated version at
http://github.com/vanatteveldt/rsyntax.

* Note on Copyright:

The substantive analyses as presented in section 5 are based on
newspaper material.  To reproduce these figures from scratch, the raw
or parsed texts from those articles are needed.  For copyright
reasons, these materials cannot be published openly.

In this repository, we included the raw output of the clause analysis
on these articles, and the code to create the tables and figures from
this output. Moreover, we included the R code to fetch the parsed
texts from our AmCAT server and process them.  To execute this code,
an authorized account on amcat.nl is required.

* Files included:

R scripts:
validation.r - reproduces the validation presented in section 4 ("Validation")
substantive.r - reproduces the substantive analysis presented in section 5 ("Substantive Use Case")
baseline.r - reproduces the baseline results for the substantive analysis presented in the online appendix
get_tokens.r - code to recreate the clauses.rds and quotes.rds files (requires AmCAT access)
function.r - auxiliary functions shared between the different R files

Data files:
quotes.rds - Contains the quote information per token
clauses.rds - Contains the clause information per token (including source and lemma)
entities.rds - Contains a list of named entities (for creating the quote network graphs)
gold_coding.rds - The manual coding of the validation set (gold standard)
gold_tokens.rds - The parsed text of the validation set
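
Inspecting the data files:

The .rds files listed above are in R's serialized object format and
can be read with readRDS(). The sketch below shows one possible way to
load and inspect them before running the analysis scripts. The helper
load_replication_data() is hypothetical and not part of this
repository, and the code assumes the files sit in the current working
directory. It makes no assumption about the columns each file
contains; it only prints their top-level structure.

```r
# Hypothetical helper (not part of the repository): read whichever of the
# listed .rds data files are present in `dir` into a named list.
load_replication_data <- function(dir = ".") {
  files <- c("quotes.rds", "clauses.rds", "entities.rds",
             "gold_coding.rds", "gold_tokens.rds")
  paths <- file.path(dir, files)
  present <- file.exists(paths)
  if (any(!present)) {
    # Report missing files instead of failing, so partial checkouts still load
    message("Missing data files: ", paste(files[!present], collapse = ", "))
  }
  data <- lapply(paths[present], readRDS)
  # Name each element after its file, without the .rds extension
  names(data) <- sub("\\.rds$", "", files[present])
  data
}

# Usage: print the top-level structure of each object that was found
# (assumes the repository root is the working directory)
data <- load_replication_data(".")
for (name in names(data)) {
  cat("==", name, "==\n")
  str(data[[name]], max.level = 1)
}
```

Returning a named list keeps the objects together and makes it easy
to check which files were found, for example with names(data).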
